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Abstract 

We introduce the first class of perfect sampling algorithms for the steady-state distribution of multi¬ 
server queues with general interarrival time and service time distributions. Our algorithm is built on the 
classical dominated coupling from the past protocol. In particular, we use a coupled multi-server vacation 
system as the upper bound process and develop an algorithm to simulate the vacation system backwards 
in time from stationarity at time zero. The algorithm has finite expected termination time with mild 
moment assumptions on the interarrival time and service time distributions. 


1 Introduction 

In this paper, we present the first class of perfect sampling algorithms for the steady-state distribution of 
multi-server queues with general interarrival time and service time distributions. Our algorithm has finite 
expected running time under the assumption that the interarrival times and service times have finite 2 + e 
moment for some e > 0. 

The goal of perfect sampling is to sample without any bias from the steady-state distribution of a given 
ergodic process. The most popular perfect sampling protocol, known as Coupling From The Past (CFTP), 
was introduced by Propp and Wilson in the seminal paper H3; see also [2] for another important early 
reference on perfect simulation. 

Foss and Tweedie im proved that CFTP can be applied if and only if the underlying process is uniformly 
ergodic, which is not a property applicable to multi-server queues. So, we use a variation of the CFTP 
protocol called Dominated CFTP (DCFTP) introduced by Kendall in 116| and later extended in [T5JIII]- 

A typical implementation of DCFTP requires at least four ingredients: 

a) a stationary upper bound process for the target process, 

b) a stationary lower bound process for the target process, 

c) the ability to simulate a) and b) backwards in time (i.e. from time [— t, 0] for any t > 0) 

d) a finite time —T < 0 at which the state of the target process is determined (typically by having the 
upper and lower bounds coalesce), and the ability to reconstruct the target process from — T up to time 

0. 

The time —T is called the coalescence time and it is desirable to have E ( T) < oo. The ingredients are 
typically combined as follows. One simulates a) and b) backwards in time (by applying c)) until the processes 
meet. The target process is sandwiched between a) and b). Therefore, if we can find a time — T < 0 when 
processes a) and b) coincide, the state of the target process is known at — T as well. Then, applying d), we 
reconstruct the target process from —T up to time 0. The algorithm outputs the state of the target process 
at time 0. 

It is quite intuitive that the output of the above construction is stationary. Specifically, assume that the 
sample path of the target process, coupled with, a) and b) is given from (—oo,0], Then we can think of the 
simulation procedure in c) as simply observing or unveiling the paths of a) and b) during [—t, 0]. When 
we find a time — T < 0 at which the paths of a) and b) take the same value, because of the sandwiching 
property, the target process must share this common value at — T. Starting from that point, property d) 
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simply unveils the path of the target process. Since this path has been coming from the infinite distant past 
(we simply observed it from time —T), the output is stationary at time 0. 

One can often improve the performance of a DCFTP protocol if the underlying target process is monotone 
m, as in the multi-server queue setting. A process is monotone if there exists a certain partial order, A, 
such that if w and w' are initial states where w A w', and one uses common random numbers to simulate 
two paths, one starting from w and the other from w ', then the order is preserved when comparing the states 
of these two paths at any point in time. Thus, instead of using the bounds a) and b) directly to detect 
coalescence, one could apply monotonicity to detect coalescence as follows. At any time —t < 0, we can start 
two paths of the target process, one from the state w' obtained from the upper bound a) observed at time 
—t, and the other from the state w A w' obtained from the lower bound b) observed at time —t. Then we run 
these two paths using the common random numbers, which are consistent with the backwards simulation of 
a) and b), in reverse order according to the dynamics of the target process, and check if these two paths meet 
before time zero. If they do, the coalescence occurs at such meeting time. We also notice that because we 
are using common random numbers and system dynamics, these two paths will merge into a single path from 
the coalescence time forward, and the state at time zero will be the desired stationary draw. If coalescence 
does not occur, then one can simply let t <— 2 1, and repeat the above procedure. For this iterative search 
procedure, we must show that the search terminates in finite time. 

While the DCFTP protocol is relatively easy to understand, its application is not straightforward. In 
most applications, the most difficult part has to do with element c). Then, there is the issue of finding good 
bounding processes (elements a) and b)), in the sense of having short coalescence times - which we interpret 
as making sure that E (T) < oo. There has been a substantial amount of research which develops generic 
algorithms for Markov chains (see for example (9j and CD- These methods rely on having access to the 
transition kernels which is difficult to obtain in our case. Perfect simulation for queueing systems has also 
received significant amount of attention in recent years, though most perfect simulation algorithms for queues 
impose Poisson assumptions on the arrival process. Sigman mm applied the DCFTP and regenerative 
idea to develop perfect sampling algorithms for stable M/G/c queues. The algorithm in [T^j requires the 
system to be super-stable (i.e. the system can be dominated by a stable M/G/l with). The algorithm in 
[20| works under natural stability conditions, but it has infinite expected termination time. A recent work 
by Connor and Kendall [5] extends Sigman’s algorithm [2U] to sample stationary M/G/c queues and the 
algorithm has finite expected termination time, but they still require the arrivals to be Poisson. The main 
reason for the Poisson arrival assumption is that under this assumption one can find dominating systems 
which are quasi-reversible (see Chapter 3 of DEO) and therefore can be simulated backwards in time using 
standard Markov chain constructions (element c)). 

For general renewal arrival process, our work is close in the spirit to ma, i, a and [5], but the model 
treated is fundamentally different Thus, it requires some new developments. We also use a different coupling 
construction to that introduced in [ 20] and refined in [5]. In particular, we take advantage of a vacation 
system which allows us to transform the problem into simulating the running infinite horizon maximum 
(from time t to infinity) of renewal processes, compensated with a negative drift so that the infinite horizon 
maximum is well defined. Finally, we note that a significant advantage of our method, in contrast to |20j 
is that we do not need to empty the system in order to achieve coalescence. This is important in many 
server queues in heavy traffic for which it would take an exponential amount of time (in the arrival rate) or 
sometimes impossible to observe an empty system. 

The rest of the paper is organized as follows. In Section [2] we describe our simulation strategy, involving 
elements a) to d), and we conclude the section with the statement of a result which summarizes our main 
contribution (Theorem[l]). Subsequent sections (Section [3j[4]&[5J provide more detailed justification for our 
simulation strategy. Lastly we conduct numerical experiments in Section [6] An online companion of this 
paper includes a Matlab implementation of the algorithm. 

2 Simulation Strategy and Main Result 

Our target process is the stationary process generated by a multi-server queue with independent and iden¬ 
tically distributed (iid) interarrival times and iid service times which are independent of the arrivals. There 
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are c > 1 identical servers, each can serve at most one customer at a time. Customers are served on a 
first-come-first-served (FCFS) basis. Let G(-) and G(-) = 1 — G(-) (resp. F(-) and F(-) = 1 — F(-)) denote 
the cumulative distribution function, CDF, and the tail CDF of the interarrival times (resp. service times). 
We shall use A to denote a random variable with CDF G, and V to denote a random variable with CDF F. 

Assumption 1 (Al). Both A and V are strictly positive with probability one and there exists e > 0, such 
that 

E[A 2+e ] < oo, E[V 2+e ] < oo. 

The previous assumption will allow us to conclude that the coalescence time of our algorithm has finite 
expectation. The algorithm will terminate with probability one if E[A 1+<L ] + E\V 1+<i } < oo. The assumption 
of A and V being strictly positive can be done without loss of generality because one can always reparametrize 
the input in cases where either A or V has an atom at zero. 

We assume that G(-) and F (•) are known so that the required parameters in Section 3.1.1 of [5] can be 
obtained. We write A = (/ 0 °° G(t)dt) -1 = 1/E[A\ as the arrival rate, and /x = (J 0 °° F(t)dt) -1 = 1 /E[V] as 
the service rate. In order to ensure the existence of the stationary distribution of the system, we require the 
following stability condition A/(c/i) < 1. 


2.1 Elements of the simulation strategy: upper bound and coupling 

We first introduce some additional notations which we shall use to describe the upper bound a) in the 
application of the DCFTP framework. Let 

T° := {T° : n £ Z\{0}} 


be a time-stationary renewal point process with T° > 0 if n > 1 and T_ n < 0 if n > 1 (the T°’s are sorted in 
a non-decreasing order in n). The time T® for n > 1 represents the arrival time of the n-th customer into the 
system after time zero and, for n > 1, T° n is the arrival time of the n-th customer, backwards in time, from 
time zero. We also define T°’ + = infjT^ : T ,® > T°}, that is, the arrival time of the next customer after T°. 
If n > 1 or n < —2, T° ,+ = T° +1 . However, = T°. Similarly, we write = sup{T^ : < T°}. 

Define A n = T° ,+ — for all n £ Z\{0}. Note that A n is the interarrival time between the customer 
arriving at time T® and the next customer. A n has CDF G(-) for n > 1 or n < —2, but A_i has a different 
distribution due the inspection paradox. 

Now, for i £ {1, 2,..., c} we introduce iid time-stationary renewal point processes 


r := {T n : n £ Z\{0}}, 


as before we have that T 1 n > 0 for n > 1 and TL n < 0 if n > 1 with the T^’s sorted in a non-decreasing order. 
We also define T^+ = inf > T^} and T*>~ = sup {T m : < T*}. Then we let = T^+ - T l n . We 

assume that V£ has CDF F (•) for n > 1 and n < —2. As we shall explain, the V^’s are activities which are 
executed by the z-tli server in the upper bound process. 

Next, we define, for each i £ {0,1,..., c}, and any u £ (— oo, oo), the counting process 

K(t) := \[u,u + t\r\T i \ , 

for t > 0, where | | denotes cardinality. Note that as TL X < 0 < T{ by stationary, Nq (0) = 0. For simplicity 
in the notation let us write N l (t) = Nq (t) if t > 0 and N l ( t ) = NJ: (— t) if t < 0. 

The quantity N® (t) is the number of customers who arrive during the time interval [it, u + t]. In the 
upper bound process, each of the c servers performs two types of activities: services and vacations. (t) is 
the number of activities initiated by server i during the time interval [it, u + t\. 


2.1.1 The upper bound process 

We shall refer to the upper bound process as the vacation system, for reasons which will become apparent. 
Let us explain first in words how does the vacation system operate. Customers arrive to the vacation system 
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according to T°, and the system operates similarly to a GI/GI/c queue, except that, every time a server 
(say server i*) finishes an activity (i.e. service or a vacation), if there is no customer waiting to be served in 
the queue, server i* takes a vacation which has the same distribution as the service time distribution; if there 
is at least one customer waiting, such customer starts to be served by server i*. Similar vacation models 
have been used in m and [12- 

More precisely, let Q v (t) denote the number of people waiting in queue at time t in the stationary vacation 
system. We write Q v (t-) := lim s -j- t Q v ( s ) and dQ v (t) := Q v (t ) — Q v {t_). Also, for for any t > 0, i £ {0,..., c} 
and each u £ (—oo, oo), define 

h-[0 

and let d/V* (t) := N^(t) — N^(t-) for all t > 0 (note that as iV* (0_) = 0, dlV* (0) = IV* (0_)). Similarly, for 
We also introduce X u (t) := N°(t) — YTi =l -WJ(t). Then the dynamics of (Q v (t) : t > 0) satisfy 

C 

dQ v C t ) = dX 0 (t) + I (Q v (t-) = 0) ^2 dN o(t)’ (!) 

*=l 

given Q v (0). Note that here we are using the fact that arrivals do not occur at the same time as the start 
of activity times; this is because the processes T l are time stationary (and independent) renewal processes 
in continuous time so that T1 1 and T[ have a density. 

It follows from standard arguments for Skorokhod mapping j6i that for t > 0 

Q v (t) = Q v (0) + X 0 (t) - inf f(Xo(a) + Q„(0))”'), 

where (A 0 (s) + Qt,(0)) _ = min (X 0 ( s ) + Q„(0), 0). Moreover, using Lyons construction we have that t > 0 

Qv(—t) = sup A'_ s (0) — X_ t (0) (2) 

S>t 

(see, for example, Proposition 1 of [3]). ( Q v (t ) : t £ (— 00 , 00 )) is a well defined process by virtue of the 
stability condition A/(//c) < 1. 

The vacation system and the target process (the GI / GI/c queue) will be coupled by using the same arrival 
stream of customers, 7 -0 , and assuming that each customer brings his own service time. In particular, the 
evolution of the underlying GI/GI /c queue is described using a sequence of the form ((T°, V n ) : n £ Z\{0}), 
where V n is the service requirement of the customer arriving at time T°. The V^s must be extracted from 
the evolution of Q v (•) so that the same service times are matched to the common arrival stream both in the 
vacation system and in the target process. 

2.1.2 The coupling: extracting service times for each costumer 

In order to match the service times corresponding to each of the arriving customers in the vacation system 
we define the following auxiliary processes. For every i £ {1,..., c}, any t > 0, and each u £ (— 00 , 00 ), let 
<j l u ( t) denote the number of service initiations by server i during the time interval [it, u + t\. Observe that 

<(t)= [ I(Q v (s_)>0)dNi(s). 

J [u,u-\-t] 

That is, we count service initiations which start at time Tf, £ [it, u + t\ if and only if Q v (T£_) > 0. Once 
again, here we use that arrival times and activity initiation times do not occur simultaneously. 

We now explain how to match the service time of the customer arriving at T%. First, such customer 
occupies position Q v (T°) > 1 when he enters the queue. Let be the delay (or waiting time) inside the 
queue of the customer arriving at T°, then we have that 

C 

D° n = inf{t > 0 :Q V (T„°) = £ (i)}, 

2=1 
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and therefore, 

C 

K = £ K^to+doj ■ dN* (T° + D° n ) . (3) 

2—1 


Observe that the previous equation is valid because there is a unique i (n) € {1,c} for which dN 1 ^ (Tjj + D/j) = 
1 and dW (T° + D°) = 0 if j ± i (n) (ties are not possible because of the time stationarity of the T's), so 
we obtain that @ is equivalent to 


V n = V, 


i(n) 




We shall explain in the appendix to this section, that (V n ■ n £ Z\{0}) and (T° : n £ Z\{0}) are two inde¬ 
pendent sequences and the V n 's are iid copies of V. 


2.2 A family of GI/GI/c queues and the target GI/GI/c stationary system 

We now describe the evolution of a family of standard GI/GI/c queues. Once we have the sequence 
({T°,V n ) : n £ Z\{0}) we can proceed to construct a family of continuous-time Markov processes ( Z u (t ; z) : t > 0) 
for each u £ (—oo, oo), given the initial condition Z u (0; z ) = z. We write z = (q, r, e), and set 

Z u (t; z) ■.= ( Q u (t; z ), R u (t; z ), E u (t; z )), 

for t > 0, where Q u ( t ; z) is the number of people in the queue at time u + t (Q u (0; z) = q), R u {t\ z) is the 
vector of ordered (ascending) remaining service times of the c servers at u + t (R u ( 0; z) = r ), and E u (t\ z) is 
the time elapsed since the previous arrival at u + t (E u ( 0; z) = e). 

We shall always use E u (0-,z) = e = u — sup{T° : T° < u} and we shall select q and r appropriately 
based on the upper bound. The evolution of the process (Z u (s; z) : 0 < s < t) is obtained by feeding the 
traffic {(T°,K) : u < T° < u + s} for s £ (0, t] into a FCFS GI/GI/c queue with initial conditions given 
by z. Constructing (Z u (s; z) : 0 < s < t) using the traffic trace {(T°, V n ) : u < T® < u + s} for s £ (0, t] is 
standard (see for example Chapter 3 of USD- 

One can further describe the evolution of the underlying GI/GI/c at arrival epochs, using the Kiefer- 
Wolfowitz vector [Tj. In particular, for every non-negative vector w £ R c , such that < uA +1 ) (where 
is the i-th entry of w), and each k £ Z\{0} the family of processes {W4 (T%;w) : n > k,n £ Z\{0}} 
satisfies 

W k (T„°'+; w)=S [{W k (T n °; w) + - A„l) + ) , (4) 

W k {/T k ] w) = w. 

where ei = (0,0,..., 1) T £ M c , 1 = (1,..., 1) T £ M c , and S is the sorting operator which arranges the entries 
in a vector in ascending order. In simple words, W k (T®;w) for k > 1 describes the Kiefer-Wolfowitz vector 
as observed by the customer arriving at T°, assuming that customer who arrived at T°, k < n, experienced 
the Kiefer-Wolfowitz state w. 

Recall that the first entry of W k (T°, w) , namely (T° , w) , is the waiting time of the customer arriving 
at T° (given the initial condition w at Tj/). More generally, the *-th entry of W k (T°;w) , namely, (T°; w) , 
is the virtual waiting time of the customer arriving at if he decided to enter service immediately after 
there are at least i servers free once he reaches the head of the line. In other words, one can also interpret 
W k (T n ° ; w) as the remaining vector of workloads (sorted in ascending order) that would be processed by each 
of the c servers at T°, if no more arrivals got into the system after time Tj/. 

We are now ready to construct the stationary version of the GI/GI/c queue. Namely, for each n £ Z\{0} 
and every t £ (—oo, oo) we define W (n) and Z (t) via 

W (n) := lim W4(T n °;0), (5) 

k — y—oo 

Z (t) := (Q (t) , R (t) , E (t)) = lim Z u (t-u,Z-), 

'LL—y —OO 

where z_ = (0, 0, e). 

We shall show in Proposition [I] that these limits are well defined. 
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2.3 The analogue of the Kiefer-Wolfowitz process for the upper bound system 


In order to complete the coupling strategy we also describe the evolution of the analog Kiefer-Wolfowitz vector 
induced by the vacation system, which we denote by ( W v (T°) : n £ Z\{0}), where v stands for vacation. As 
with the i- th entry of the Kiefer-Wolfowitz vector of a GI/GI/c queue, the i-th entry of W v (T°), namely 

(r°), is the virtual waiting time of the customer arriving at time if he decided to enter service 
immediately after there are at least i servers free once he reaches the head of the line (assuming that servers 
become idle once they see, after the completion of a current activity, the customer in question waiting in the 
head of the line). 

To describe the Kiefer-Wolfowitz vector induced by the vacation system precisely, let U l ( t ) be the time 
until the next renewal after time t in T l , that is U l (t) = inf{T^ : > t} — t. So, for example, U° (T°) = A n 

for n> 1. Let U ( t ) = ( U 1 ( t ) ,..., U c (t)) T . We then have that 

W v (T°) = D° n l+S (u ((T° + £>")_)) . (6) 

In particular, note that Wj, '(T°) = D®. 

Actually, in order to draw a closer connection to the Kiefer-Wolfowitz vector recursion of a standard 
GI/GI/c queue, let us write 


S(u((7% + D° n )_)) = (t/W ((TZ + D° n )U& (( T° + D 0 n )_) T , 

and suppose that U M + D//j ^ (i.e. ji{n) is the server whose remaining 

activity time right before is the i -th smallest in order). Then, define 

W v (T°) = W v (T°) + V nei -A n 1, 

and let Wv' > (T°) to be the i-th entry of W v (T°). It is not difhcult to see from the definition of W v (T°) 
that 

(T„°>+) = 5 (T n °)) + + , 

where 

= I(W® (T°) < 0) • U j *W ((T°-+ + D° n +)_) , and 
D° n ’+ = inf {D° k : D° k > D°J. 

So, Q actually satisfies 

W v (T„°'+) = 5 ((W„ (T„°) + y„ ei - A n l) + +3„) , (7) 

where S n = (e£\ ..., S^ c) ^ . 


2.4 Description of simulation strategy and main result 

We now describe how the variation of DCFTP that we described in the Introduction, using the monotonicity 
of the multiserver queue, and elements a) to d) apply to our setting. 

The upper bound is initialized using the Kiefer-Wolfowitz process associated to the vacation system. For 
the lower bound, we shall simply pick the null vector. 

The strategy combines the following facts (which we shall discuss in the sequel). 

• Fact I: We can simulate sup s>t X_ s (0), (N l _ t (0) : 1 < i < c ), (Nq (t) : 1 < i < cj, jointly for any given 
t > 0. This part, which corresponds to item c), is executed using an algorithm from [3j designed to 
sample the infinite horizon running time maximum of a random walk with negative drift. We shall 
explain in Section [5] how the algorithm in [5j can be easily modified to sample sup s>t X_ s (0) jointly 
with (Ad s (0) : s > 0)? =0 . 
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• Fact II: For all k < —1 and every k < n < — 1 we have that 

W k (T n °; 0) < W (n) < W k (T n °; W v ( T °)) . 

This portion exploits the upper bound a) (i.e. W v (T°)), and the lower bound b) (i.e. 0). 

• Fact III: We can detect that coalescence occurs at some time —T £ [T°, 0] for some n < —1 by finding 

neZ_,n>fi, such that (T°; W v (T°)) < 0 and 

W k (T^W v (T° k ))=W k (T^0). 

This portion is precisely the coalescence detection strategy which uses monotonicity of the Kiefer- 
Wolfowitz vector. 

• Fact IV: We can combine Facts I-III to conclude that 

Z t o(\T° k \-,Q(T°_ k ),S{U(T o _ k )),0) = Z(0) (8) 

is stationary. And we also have that 

W K (T°;0) = W (1) 

follows the stationary distribution of the Kiefer-Wolfowitz vector of a GI/GI/c queue. 

The main result of this paper is the following. 

Theorem 1. Assume (Al) is in force, with X/(fic ) £ (0,1). Then Facts I-IV hold true and © is a 
stationary sample of the state of the multi-server system, we can detect coalescence at a time —T < 0 such 
that E (T) < oo. 

The rest of the paper is dedicated to the proof of Theorem 1. In Section [3] we verify a number of 
monotonicity properties which in particular allows us to conclude that the construction of W (n) and Z (t) 
is legitimate (i.e. that the limits exist almost surely). This monotonicity properties also yield Fact II and 
pave the way to verify Fact III. Section [4] proves the finite expectation of the coalescence time. In Section [5] 
we give more details as how to carry out the simulation of the upper bound process. But before going over 
Facts, we conclude this section, as we promised, arguing that the 14,’s are iid and independent of the arrival 
sequence T°. 

2.5 Appendix: The iid property of the coupled service times and independence 
of the arrival process 

In order to explain why the V n ’s form an iid sequence, independent of the sequence T° = {T° : n € Z\{0}}, 
it is useful to keep in mind the diagram depicted in Figure [TJ which illustrates a case involving two servers, 
c= 2. 

The assignment of the service times, as we shall explain, can be thought of as a procedure similar to a 
tetris game. Arrival times are depicted by dotted horizontal lines which go from left to right starting at the 
left most vertical line, which is labeled “Arrivals”. Think of the time line going, vertically, from the bottom 
of the graph (past) to the top of the graph (future). 

In the right most column in Figure [I] we indicate the queue length, right at the time of a depicted arrival 
(and thus, including the arrival itself). So, for example, the first arrival depicted in Figure |T] observes one 
customer waiting and thus, including the arrival himself, there are two customers waiting in queue. 

The tetris configuration observed by an arrival at time T is comprised of two parts: i) the receding horizon, 
which corresponds to the remaining incomplete blocks, and ii) the landscape, comprised of the configuration 
of complete blocks. So, for example, the first arrival in Figure [I] observes a receding horizon corresponding 
to the two white remaining blocks which start from the dotted line at the bottom. The landscape can be 
parameterized by a sequence of block sizes and the order of the sequence is given by the way in which the 
complete blocks appear from bottom to top - this is precisely the tetris-game assignment. There are no 


Queue length 
at arrival 



Figure 1: Matching Procedure of Service Times to Arrival Process 

ties because of the continuous time stationarity and independence of the underlying renewal processes. The 
colors are, for the moment, not part of the landscape. We will explain the meaning of the colors momentarily. 

The assignment of the service times is done as follows. The arriving customer reads off the right-most 
column (with heading ” Queue length at arrival”) and selects the block size labeled precisely with the number 
indicated by the ’’Queue length at arrival”. So, there are two distinctive quantities to keep in mind assigned 
to each player (i.e. arriving customer): a) the landscape (or landscape sequence, which, as indicated can be 
used to reconstruct the landscape), and b) the service time , which is the complete block size occupying the 
’’Queue length at arrival”-th position in the landscape sequence. 

The color code in Figure [l] simply illustrates quantity b) for each of the arrivals. So, for example, the 
first arrival, who reads ’’Queue length at arrival = 2” (which we have written in green color), gets assigned 
the second complete block, which we have depicted in green. Similarly, the second arrival depicted, reads off 
the number ”1” (written in red) and gets assigned the first red block depicted (from bottom to top). The 
very first complete block (from bottom to top), which is depicted in black, corresponds to the service time 
assigned to the customer ahead of the customer who collected the green block. The number ” 1” (in red) is 
obtained by observed that the customer with the initial black block has departed. 

Now we argue the following properties: 

1) The service times are iid copies of V. 

2) The service times are independent of T°. 

About property 1): The player arriving at time T, reads a number, corresponding to the queue length, 
which is obtained by the past filtration Ft generated by Ufc e z\/ovo <i< c {Tl : TJ: < T}. Conditional on 
the receding horizon (i.e. remaining incomplete block sizes), 7 Zt, the past filtration is independent of the 
landscape. This is simply the Markov property applied to the forward residual life time process of each of the 
c renewal processes represented by the c middle columns. Moreover, conditional on TZt, each landscape forms 
a sequence of iid copies of V because of the structure of the underlying c renewal processes corresponding to 
the middle columns. So, let Q (T) denote the queue length T (including the arrival at time T), which is a 
function of the past filtration, and let {Lt (k) : k > 1} be the landscape sequence observed at time T, so that 
Lt (Q (T)) is the service time of the customer who arrives at time T. We then have that for any positive 
and bounded continuous function / (•) 

E[f {L t (Q (T))) |7^ T ] = E[f {L T (1)) \K T ] = E[f {V) \U T \, 
precisely because conditional on 7 Zt, Q (T) (being Ft measurable) is independent of Lt- 
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To verify the iid property, let /i, f 2 be non-negative and bounded continuous functions. And assume that 
T) < T 2 are arrival times in T° (not necessarily consecutive). Then, 

E{fi{LTAQ(Ti)))f2(LTAQ(T a m 

= E[E[f x (L Tl (Q (Ti))) / 2 (i T2 (Q (T 2 )))|J- T2 ,^t 2 ]] 

= E[h (L Tl (Q (Ti))) E[f 2 (L T2 (Q (T 2 )))\Ft 2 ,TI T2 ]} 

= E[fi(L Tl (Q {T 1 ))))E[f 2 (V)} = E\h{V)]E[f 2 {V)]. 

The same argument extends to any subset of arrival times and thus the iid property follows. 

About property 2): Note that in the calculations involving property 1), the actual values of the arrival 
times T, and T\ and T 2 are irrelevant. The iid property of the service times is established path-by-path 
conditional on the observed realization 7~°. Thus the independence of the arrival process and service times 
follows immediately. 


3 Monotonicity Properties and Stationary GI/GI/c System 

This section we will present several lemmas which contain useful monotonicity properties. The proofs of the 
lemmas are given at the end of this section in order to quickly arrive to the main point of this section, which 
is the construction of a stationary version of the GI / GI/ c queue. 

First we recall that the Kiefer-Wolfowitz vector of GI/GI/c queue is monotone in the initial condition 
and invoke a property (|10|) which will allows us to construct a stationary version of the Kiefer-Wolfowitz 


vector of our underlying GI/GI /c queue, using Lyons construction. 

Lemma 1. For n > k, k,n € Z\{0}, w + > w~, 


W k (T°-,w+)>W k (T^w~). 

(9) 

Moreover, if k < k' < n 


W k (T„°; 0) > W k> (T n °;0) . 

(10) 


The second result allows to make precise a sense in which the vacation system dominates a suitable family 
of GI/GI/c systems, in terms of the underlying Kiefer-Wolfowitz vectors. 

Lemma 2. For n> k, k,n € Z\{0}, 


w v (T°)>w k (T° n ;W v (r°)). 

The next result shows that in terms of queue length processes, the vacation system also dominates a 
family of GI/GI/c queue, which we shall use to construct the upper bounds. 

Lemma 3. Let q = Q v {u), r = S (U ( u )), and e = u — sup{T° : < u}, so that z + = ( q,r,e ) and 

z~ = (0,0, e) then for t > u 

U, Z ) f Quif U, Z ) ^ 

Using Lemmas m and [3] we can establish the following result. 

Proposition 1. The limits defining W ( n ) and Z ( t ) in © exist almost surely. Moreover, we have that Fact 
II holds. 


Proof of Proposition [7} Using Lemma [2] and Lemma [T] we have that 

W v (T n °) > W k (T n °; W v (T°)) > W k (T°; 0) . 

So, by property (10) in Lemma [l] we conclude that the limit defining W ( n ) exists almost surely and that 

W{n)<W v (T° n ). (11) 
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Similarly, using Lemma || we can obtain the existence of the limit Q(t) and we have that Q (t) < Q v ( t ). 
Moreover, by convergence of the Kiefer-Wolfowitz vectors we obtain the Tth entry of R{T° + W^\n)), 

namely, (T° + (n)) = (W« ( n ) — (n)) + , where i G {1,..., c}. Clearly, since the age process 

has been taken underlying T°, we have that E (t) = t — sup{T° : < t}. The fact that the limits are 

stationary follows directly from the limiting procedure and it is standard in Lyons-type constructions. For 
Fact II, we use the identity W (n) = W k (T%;W (A;)), combined with Lemma [I] to obtain 

W k (T°; 0) < W k (T°; W (k)) = W (n ), 

and then we apply Lemma |2j together with ( |11[ ), to obtain 

W ( n ) = W k (T°; W (k)) < W k (T°; W v (T°)) . 

The previous two inequalities are precisely the statement of Fact II. f3; 


3.1 Proof of technical Lemmas 

Proof of Lemma[ 7] Both facts are standard, the first one can be easily shown using induction. Specifically, 
we first notice that W k {T®;w + ) = w + > w~ = W k (T®-,w~) Suppose that W k (T°;w + ) > W k (T®\w~) for 
some n > 0, then 


W k (T% +1 ; w+) = S (( W k (T W + ) + V n - A n ) + ) 

> 5 (iW k (T^w~) + V n - A„) + ) = W k {T° +1 -w~). 


For inequality (101, we note that W k (T°,;0) > W k ' (T°,; 0) = 0, and therefore, due to (J9J), we have that 

W k (T°; 0) = (T°; W k (k'; 0)) > W k , (T n °; 0) . 


□ 


Proof of Lemma This fact follows immediately by induction from equations Q and (J7| using the fact that 


> 0 . 


□ 


Proor of Lemma\ 3| We first prove the inequality Quit — u ; z + ) < Q v {t). Note that U l ( u ) > 0 for all u (the 
forward residual life time process is right continuous), so the initial condition r indicates that all the servers 
are busy (operating) and the initial q > 0 customers will leave the queue (i.e. enter service) at the same time in 
the vacation system as under the evolution of Z u (•; z + ). Now, let us write N = inf{n : > u} (in words, the 

next arriving customer at or after u arrives at time T ( f ). It is easy to see that S (U (T^r)) > R u {Tpf — u; z + ); 
to wit, if occurs before any of the servers becomes idle, then we have equality, and if T ( f occurs after, 
say l > 1, servers become idle, then R U (T ^ — u\z + ) will have l zeroes and the bottom c — l entries will 
coincide with those of S (U (Tjyd), which has strictly positive entries. So, if wn is the Kiefer-Wolfowitz 
vector observed by the customer arriving at T { f, (induced by Q u {- — u\z + )) 1 then we have W v \T%) > wn- 
By monotonicity of the Kiefer-Wolfowitz vector in the initial condition and because of Lemma [2] we have 

W v (T°) > W N (T°, W v (7 %)) > W N (T fc °, w N ) , 


for all k> N, and hence, T® + D^ > (T®,wn)- Therefore, the departure time from the queue (i.e. 

initiation of service) of the customer arriving at T(? in the vacation system occurs after the departure time from 
the queue of the customer arriving at time in the GI/GI/c queue. Consequently, we conclude that the set 
of customers waiting in the queue in the GI / GI /c system at time t is a subset of the set of customers waiting 
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in the queue in the vacation system at the same time. Similarly, we consider Q u (t — u; z ) < Quit — u: z + ), 
which is easier to establish, since for k > N (with the earlier definition of Ty and ivn), 

W N (T° ; w N ) > W N (T°; 0) , 

So the set of customers waiting in the queue in the lower bound GI/GI /c system at time t is a subset of the 
set of customers waiting in the upper bound GI/GI /c system at the same time. □ 


4 The coalescence detection in finite time 


In this section, we give more details about the coalescence detection scheme. The next result corresponds to 
Fact III and Fact IV. 

Proposition 2. Suppose that w + = W v (T)°) and w~ = 0. Assume that W k (T®-,w + ) = W k 

for some k < n < — 1. Then, W k (Z^ ; \w + ) = W ( m ) = Wk (Tmi w ) f or 171 > n - Moreover, for all 

t>T° + W^ {T%-w+), 

Z T o it - T° k ; Q v (T°) , 5 (U ( T° k )) , 0) = Z T o (t - T fe °; 0, 0,0) = Z it) . (12) 

Proof of Proposition The fact that 

W k (T° ; w+) = W (m) = W k (T° ; w~) 


for to > n follo ws i mmediately from the recursion defining the Kiefer-Wolfowitz vector. Now, to show the 
first equality in (12) it suffices to consider t = T° + W^ (T® ; w + ), since from t > T® the input is exactly the 

same and everyone coming after will depart the queue and enter service after + W^ (T°;ui + ). The 
arrival processes (i.e. E u (•)) clearly agree, so we just need to verify that the queue lengths and the residual 
service times agree. First, note that 


^to(T„° + Wf } (T„°; w+) - T°; Q v iT° k ) , 5 (t/ (Z fc 0 )) , 0) (13) 

= W k (T»-w + )-Wl l) (T°-,w+) 1 
= W k (TZ-,w~)-Wl 1) (T°;w~) 1 
= R t o (T„° + Wf } (T„°; w~) - T k ; 0,0, 0). 


So, the residual service times of both upper and lower bound processes agree. The agreement of the queue 
lengths follows from Lemma [3] Finally, the second equality in (12) is obtained by taking the limit in the last 
equality in (13), and the equality between queue lengths follows again from Lemma [3j □ 


Next we analyze properties of the coalescence time. Define 

T- = sup{T° < 0 : inf: [Z T o (i - T°; Q v (T°) , 5 (U (T fc 0 )) , 0) - Z T o(t - T fe °; 0,0, 0)] = 0}. 

± k — — 

By time reversibility we have that | T!_ | is equal in distribution to 

T = inf{T fe °>0: inf [Z 0 (t; Q v i0) , 5 (U (0)), 0) - Z 0 (t; 0,0, 0)] = 0}. 

0 <t<T° 

We next establish that E[T ] < oo. This result, except for the verification of Fact I, which will be discussed 
in Section [5j completes the proof of Theorem [T] 

Proposition 3. If E[V n ] < cE[A n \ for n > 1 and Assumption (Al) holds, 


E[T] < oo. 
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Proof of Proposition [3| Define 

r = inf {n > 1 : W\ (T°; W v (1)) = W, (T°; 0)} . 

By Wald’s identity, EA n < oo, for any n > 1, it suffices to show that E[t] < oo. We prove the proposition 
by considering a sequence of events which happens with positive probability and leads to the occurrence of 
r. The idea follows from the proof of Lemma 2.4 in Chapter XII of [Tj. As K[V)j] < cE[A n ], for n > 2,we can 
find m, e > 0 such that for every n > 2, the event H n = {V n < cm — e, A n > m} is nontrivial in the sense 
that P ( H n ) > 5 for some S > 0. Moreover, because V\ and Ai are independent with continuous distribution 
we also have that P (Hi) > S. Now, pick K > cm , large enough, and define 


k-\-\cK/e \ +c 

U = [w[ c) (T°; W v (1)) < K } f) H n . 

n=k 

In what follows, we first show that if 12 happens, the two bounding systems would have coalesced by the time 
of the (k + c + |~civ/e~|)-th arrival. We then provide an upper bound for E[t]. Let 

W k = W^T^Wyil)). 


For n > k, define V n = cm — e, A n = m. and the (auxiliary) Kiefer-Wolfowitz sequence 


Wn+l 


= S 


((Wn + Vnei-An l) 



Then 12 implies V n < V n and A n > A n , for n > k, which in turn implies W\ (T®;W V (1)) < W n . Moreover, 
It is easy to check that Wn^ = 0 and Wn^ < cm for n = k + \cK/e\ + 1, • • ■ , k + \cK/e\ + c. Then, 
W[ 1 ^ (T°; W v (1)) = 0 and Wj c ^ (T°; W v (1)) < cm for n = k+ \cK/e\ + 1, • • • ,k+ \cK/e] +c. This indicates 
that under f2, all the arrivals between the (k + [ cK/e) + l)-th arrival and the (k + \cK/t\ + c)-th arrival 
(included) have zero waiting time (enter service immediately upon arrival), and the customers initially seen 
by the (k + \cK/e\ + l)-th arrival would have had left the system by the time of the (k + \cK/E\ + c)-th 
arrival. The same analysis clearly holds assuming that we replace W\ (T°; W v (1)) by W\ (T(?; 0) throughout 
the previous discussion. Therefore, by the time of the (k + \cK/e\ + c)-th arrival, the two bounding systems 
would have exactly the same set of customers with exactly the same remaining service times, which is equal to 
their service times minus the time elapsed since their arrival times (since all of them start service immediately 
upon arrival). We also notice that since there is no customer waiting, the sorted remaining service time at 
Tk+ \cK/e\ +c coincide with the Kiefer-Wolfowitz vector MA+fc/c/el+c- Under Assumption (Al) and A < c/x, 
{W (T n ° \w (D) : n > 1} for any fixed initial condition w , is a positive recurrent Harris chain, [T] - in fact, 
positive recurrence follows even under the assumption of finite means. Now, again under Assumption (Al), 
we have that 


E 


Ewjdd) 


< oo, 


Therefore, it is straightforward to show, using a standard linear Lyapunov function, that for I\ large enough, 
if ki := inf{n > 1 : (T®;W V (1)) < K}, then E[k i] < oo. Moreover, for i > 2, define 


ki := {n > ki-i + \cK/e\ + c : W^ ( n ; W v (1)) < K}. 


By the positive recurrence of the Kiefer-Wolfowitz vector, we can find a constant M > 0 such that 


E[ki - ki- 1] < M. 


In this way, we can split the process into cycles (not necessarily regenerative) [k*, ki+ 1 ) for i > 1, and the initial 
period [1, hi). We denote f2j = f) n=ki K ^^ +C ^ or * = 1, 2, • • •. Since P(H n ) > <5, P(f2j) > S^ cK / eli+c+1 > 0. 




13 


Let N = inf{z >1:1 (fij = 1)} (i.e. the first i for which Qi occurs), then £?[iV] < S i\ cK /<^+ c + 1 ) < oo. By 
Wald’s identity we have (setting k 0 = 0) that 

E[t] < E[k n ] + \cK/e \ + c 

N 

= E^2(ki - hi- 1 ) + \cK/e] + c 

i=i 

< E[N] xM + E[k i] + \cK/e} + c. 


□ 

5 Fact I: Simulation of Stationary Vacation System Backwards in 
Time 

In this section, we address the validity of Fact I, namely, that we can simulate the vacation system backwards 
in time, jointly with {T^ : m < n < — l} for 1 < i < c for any m < — 1. 

Let G e (*) = A J Q G(x)dx and F e (-) = /j. J Q F(x)dx denote equilibrium CDF’s of the interarrival time 
and service time distributions respectively. We first notice that simulating the stationary arrival process 
{T° : n < — l} and stationary service/vacation completion process : n < — l} for each 1 < i < c is 
straightforward by the reversibility of 7^ for 0 < i < c. Specifically, we can simulate the renewal arrival 
process forward in time from time 0 with the first interarrival time following G e and subsequence interarrival 
times following G. We then set for all k > 1. Likewise, we can also simulate the service/vacation 

completion process of server i, for i = 1 ,... ,c, forward in time from time 0 with the first service/vacation 
completion time following F e and subsequent service/vacation requirements distributed as F. Let denote 
the fc-th service/vacation completion time of server i counting forwards (backwards) in time. Then we set 
T- k = -T l k . 

Similarly, we have the equality in distribution, for all t > 0 (jointly) 

(0) = Xq (i), 

therefore we have from ([2]) that the following equality in distribution holds for allt > 0 (jointly) 

Qv(~t) = sup X 0 (s) - X 0 (t). 


The challenge in simulating Q v (—t ), involves in sampling M(t) = max s > t {Xo(s)} jointly with Nq(t) for 
1 < * < c during any time interval of the form [0,T] for T > 0. If we can do that, then we can evaluate 

C 

x 0 (t) = jv°(i)-x;jv3(i), 

i= 1 


and, therefore, Q v (—t) = M{t) — X(t). 

In what follows we first introduce a trick to decompose the process of sampling M (t) into that of sampling 
the maximum of c + 1 independent negative drifted random walks. 

Choose a € (A, c/i). Then 


X{t) = (Ng{t) -at) + ^T (^t - N l 0 {t)^j . 

i—1 

We define (c + 1) negative drifted random walks as follows: 


^ 0) = 0, = Si 0 !, + (- aA n + 1) for n = 1,2,..., 
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and for i = 1,..., c, 


Sf = V/ 0 , = Ci + (-1 + “C) for n = 1,2,.... 

Figure [ 2 ] plots the relationship between {JVg(i) — at : t > 0} and {Sn^ : n > 0}, and the relationship 
between {"t — ATg(i) : t > 0} and { Sr!'' 1 ■ n > 0} for i = 1,..., c. From FigurejiJ it is easy to check that 

max{A^Q (s) — as} = maxjlVg (t) — at, max {£1°^}}, 
s>t n>Ng(t) + l 


and for i = 1,..., c, 


max|-s — ATg(s) j = max{-t — .ZVg(i), max {S^}}. 


n>N' 0 (t) 


Figure 2: The relationship between the renewal processes and the random walks 



(b) ( a/c)t — N^(t) and S', 


(0 


c 


r >(0 



Consequently, the algorithm to simulate M(t) = max s > t {X 0 (s)} jointly with Nq (t) for 1 < i < c during 
any time interval of the form [0,T] for T > 0 can be implemented if we can simulate Sn ' 1 jointly with 
= max{5'^’ > : k > n} for n > 1 for i = 0,1,...,c . The algorithm to generate Sn^ jointly with 
under Assumption (Al) has been developed in [3], We provide a detailed Matlab implementation of each of 
the algorithms required to execute Facts I-IV in the online appendix to this paper. 


6 Numerical experiments 

As a sanity check, we have implemented our Matlab code in the case of an Ad/M/c queue, for which the 
steady-state analysis can be performed in closed form. 

As a first step, we have compared the theoretical distribution to the empirical distribution obtained from 
a large number of runs of our perfect simulation algorithm for different sets of parameter values, and they 
are all in close agreement. As an example, Figure [3] shows the result of such test when A = 3, = 2, c = 2. 

Grey bars show the empirical result of 5, 000 draws using our perfect simulation algorithm, and black bars 
show the theoretical distribution of number of customers in system. Figure [4] provides another comparison 
with a different set of parameters. 

Next we run numerical experiments to see how the running time of our algorithm, measured by mean 
coalescence time of two bounding systems, scales as the number of servers grows and the traffic intensity p 
changes. Starting from time 0, the upper bound queue has its queue length sampled from the theoretical 
distribution of an M/M/c vacation system and all servers busy with remaining service times drew from the 
equilibrium distribution of the service/vacation time; and the lower bound queue is empty. Then we run 
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both the upper bound and lower bound queues forward in time with the same stream of arrival times and 
service requirements until they coalescence. Table [l] shows the estimated average coalescence time, E[T\, 
based on 5000 iid samples, for different system scales in the Quality driven regime (QD). We observe that 
E[T ] does not increase much as the system scale parameter, s, grows. Table [2] shows similar results for the 
Quality-and-Efhciency driven operating regime (QED). In this case, E[T\ increases at a faster rate with s 
than the QD case, but the magnitude of increment is still not significant. 

Table 1: simulation result for coalescence time of M/M/c queue 
(QD: A 3 = s,c s = 1.2s, fj, = 1) 


s 

mean 

95% confidence interval 

100 

6.4212 

[6.2902, 6.5522] 

500 

7.0641 

[6.9848, 7.1434] 

1000 

7.7465 

[7.6667, 7.8263] 


Table 2: simulation result for coalescence time of M/M/c queue 
(QED: A s = s, c s — s + 2\/s, y = 1) 


s 

mean 

95% confidence interval 

100 

6.5074 

[6.3771, 6.6377] 

500 

8.5896 

[8.4361, 8.7431] 

1000 

9.4723 

[9.3041, 9.6405] 


Finally we run a numerical experiment aiming to test how computational complexity of our algorithm 
changes with traffic intensity, p = X/cp. Here we define the computational complexity as the total number 
of renewals (including arrivals and services/vacations) the algorithm samples in total to find the coalescence 
time. We expect the complexity to scale like (c+ 1)(1 — p)~ 2 E[T(p)\ where (c+ 1) is the number of renewal 
processes we need to simulate, (1 — p)~ 2 is on average the amount of renewals we need to sample to find its 
running time maximum for each renewal process, and E[T(p )] is the mean coalescence time when the traffic 
intensity is p. Table [3] summarizes our numeral results, based 5000 independent runs of the algorithm for 
each p. We run the coalescence check at 10 x 2 k for k = 1, 2,..., until we find the coalescence. We observe 
that as p increase, the computational complexity increases significantly, but when multiplied by (1 — p) 2 , 
the resulting products are of about the same magnitude - up to a factor proportional to A, given that the 
number of arrivals scales as lambda per unit time. Therefore, the main scaling parameter for the complexity 
here is (1 — p) -2 . Notice that if we simulate the system forward in time from empty, it also took around 
O ((1 — p) -2 ) arrivals to get close to stationary. 
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